Overview of Topics

  • Introduction of the Problem

  • Statistics for Testing Equality of High Dimensional Covariance Matrices

  • Dimension Reduction via the Singular Value Decomposition

  • Simulation Study Setup

  • Results

  • Summary

  • Future Work

  • References

Introduction of the Problem

Introduction

  • Considering high-dimensional data

  • Five test statistics for testing equality of covariance matrices

  • Two dimension reduction methods via singular value decomposition

  • Interested in testing equality of both \(2\) and \(k\) covariance matrices

Testing Equality of Covariances

  • Let \(\boldsymbol{\Sigma}_i \in \mathbb{R}^{p \times p}\) be the positive definite covariance matrix of the \(i^\text{th}\) population for \(i = 1, 2, \dots, k\).

  • We wish to test

\[H_0 : \boldsymbol{\Sigma}_1 = \ldots = \boldsymbol{\Sigma}_k.\]

  • Our sample covariance matrices, \(\boldsymbol{S}_1, \ldots, \boldsymbol{S}_k\), are distributed \(n_i \boldsymbol{S}_i \sim W_p(\boldsymbol{\Sigma}_i, n_i)\).

Modified Likelihood Ratio Test


\[M = n \log |\boldsymbol{S}| - \sum \limits^k_{i=1} n_i \log |\boldsymbol{S}_i| \xrightarrow{d} \chi^2\]


This modified likelihood ratio test is only valid when each \(\boldsymbol{S}_i\) is nonsingular, i.e., when \(n_i > p\).
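As an illustrative sketch (not from the slides), the statistic can be computed with numpy; the pooled covariance \(\boldsymbol{S}\), the group sizes, and the identity population covariance are all assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_i, p = 2, 50, 5          # n_i > p so each S_i is nonsingular (assumed sizes)
samples = [rng.standard_normal((n_i, p)) for _ in range(k)]

# Per-group sample covariance matrices and the pooled estimate
S_list = [np.cov(X, rowvar=False, bias=True) for X in samples]
n = k * n_i
S = sum(n_i * S_i for S_i in S_list) / n

# Modified likelihood ratio statistic: M = n log|S| - sum_i n_i log|S_i|
M = n * np.linalg.slogdet(S)[1] - sum(
    n_i * np.linalg.slogdet(S_i)[1] for S_i in S_list
)
```

By concavity of \(\log|\cdot|\), \(M \geq 0\), with large values indicating departure from \(H_0\).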

Wald Test


\[W = \frac{n}{2} \left \{ \sum \limits^k_{i = 1}\frac{n_i}{n} tr \left(\boldsymbol{S}_i\boldsymbol{S}^{-1}\boldsymbol{S}_i\boldsymbol{S}^{-1}\right) - \sum \limits^k_{i=1}\sum \limits^k_{j = 1}\frac{n_in_j}{n^2}tr \left(\boldsymbol{S}_i\boldsymbol{S}^{-1}\boldsymbol{S}_j\boldsymbol{S}^{-1}\right) \right \} \xrightarrow{d} \chi^2\]


This Wald test is only valid when \(\boldsymbol{S}\) is nonsingular, i.e., when \(n > p\).
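A minimal numpy sketch of the Wald statistic above (illustrative only; the group sizes and the standard normal data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
k, n_i, p = 2, 50, 5                 # n > p so the pooled S is nonsingular
groups = [rng.standard_normal((n_i, p)) for _ in range(k)]

S_list = [np.cov(X, rowvar=False, bias=True) for X in groups]
ns = [n_i] * k
n = sum(ns)
S = sum(m * Si for m, Si in zip(ns, S_list)) / n
S_inv = np.linalg.inv(S)

# First sum: (n_i/n) tr(S_i S^-1 S_i S^-1)
term1 = sum(m / n * np.trace(Si @ S_inv @ Si @ S_inv)
            for m, Si in zip(ns, S_list))
# Double sum: (n_i n_j / n^2) tr(S_i S^-1 S_j S^-1)
term2 = sum(ni * nj / n**2 * np.trace(Si @ S_inv @ Sj @ S_inv)
            for ni, Si in zip(ns, S_list)
            for nj, Sj in zip(ns, S_list))
W = n / 2 * (term1 - term2)
```

Since \(\sum_i (n_i/n)\boldsymbol{S}_i = \boldsymbol{S}\), the double sum equals \(p\), and \(W \geq 0\).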

Statistics for Testing Equality of
High Dimensional Covariance Matrices

Frobenius Norm

  • Ledoit and Wolf 2004 showed some nice properties of the Frobenius norm:

\[ d^2 = \frac{tr \left( \boldsymbol{\Sigma}_i - \boldsymbol{\Sigma}_j \right)^2 }{p} = \frac{tr \left( \boldsymbol{\Sigma}_i^2 \right) }{p} + \frac{tr \left( \boldsymbol{\Sigma}_j^2 \right) }{p} - \frac{2tr \left( \boldsymbol{\Sigma}_i \boldsymbol{\Sigma}_j \right) }{p}. \]

  • Dividing by \(p\) is not typical, but it makes the norm of the identity matrix equal to 1.

  • The norm is invariant to rotation.
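A quick numerical check of the scaled Frobenius distance and its expansion (an illustrative sketch; the small dimension and the diagonal \(\boldsymbol{\Sigma}\) matrices are arbitrary choices):

```python
import numpy as np

p = 4
Sigma_i = np.eye(p)
Sigma_j = 2.0 * np.eye(p)

# Dividing by p makes the scaled norm of the identity equal to 1
norm_I = np.trace(np.eye(p) @ np.eye(p)) / p

# d^2 = tr((Sigma_i - Sigma_j)^2) / p
diff = Sigma_i - Sigma_j
d2 = np.trace(diff @ diff) / p

# Expanded form from the slides
d2_expanded = (np.trace(Sigma_i @ Sigma_i) + np.trace(Sigma_j @ Sigma_j)
               - 2.0 * np.trace(Sigma_i @ Sigma_j)) / p
```

Here \(d^2 = tr((\boldsymbol{I} - 2\boldsymbol{I})^2)/p = 1\), and the two forms agree exactly.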

Schott 2007

  • Schott proposed a test in 2007 based on the Frobenius norm.


\[ T_{Sc} = \sum \limits^{k}_{i < j} \frac{ \left( \hat{a}_{2i} + \hat{a}_{2j} - \frac{2}{p}tr \left( \boldsymbol{S}_i \boldsymbol{S}_j \right) \right) ^ 2}{\theta} \xrightarrow{d} \chi^2\]
\[\theta = 4 \hat{a}_2^2 \left( \sum \limits_{i < j}^k \left( \frac{1}{n_i} + \frac{1}{n_j} \right) + (k - 1)(k - 2) \sum \limits_{i = 1}^k n_i^{-2} \right)\]

Schott 2007

  • The \(\hat{a}_{ji}\) terms are consistent estimators of \(\frac{tr\left( \boldsymbol{\Sigma}_i^j\right)}{p}\).


\[\hat{a}_{2i} = \frac{tr \left( \boldsymbol{V}_i^2 \right) - \frac{1}{n_i}tr \left( \boldsymbol{V}_i \right)^2}{ \left( n_i - 1 \right) \left(n_i + 2 \right)p} \xrightarrow{P} \frac{tr \left( \boldsymbol{\Sigma}_i^2 \right) }{p}\]

\[\hat{a}_2 = \frac{tr \left( \boldsymbol{V}^2 \right) - \frac{1}{n}tr \left( \boldsymbol{V} \right)^2 }{(n - 1)(n + 2)p} \xrightarrow{P} \frac{tr \left( \boldsymbol{\Sigma}^2 \right) }{p}\]
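A numerical sanity check of \(\hat{a}_{2i}\) (a sketch, not from the slides; it assumes \(\boldsymbol{V}_i\) is the centered sums-of-squares matrix with \(n_i\) degrees of freedom and an identity population covariance, so the target \(tr(\boldsymbol{\Sigma}^2)/p = 1\)):

```python
import numpy as np

rng = np.random.default_rng(1)
n_i, p = 40, 100                      # assumed degrees of freedom and dimension
X = rng.standard_normal((n_i + 1, p)) # n_i + 1 observations -> n_i df

# V_i: centered sums-of-squares matrix, V_i ~ W_p(I, n_i) under normality
Xc = X - X.mean(axis=0)
V = Xc.T @ Xc

# Consistent estimator of tr(Sigma^2)/p from the slides (Schott 2007)
a2_hat = (np.trace(V @ V) - np.trace(V) ** 2 / n_i) / ((n_i - 1) * (n_i + 2) * p)
```

With \(\boldsymbol{\Sigma} = \boldsymbol{I}\), `a2_hat` should land near 1 even though \(p > n_i\).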

Chaipitak and Chongcharoen 2013

  • Chaipitak and Chongcharoen proposed a test in 2013 based on the ratio \(b = tr \left( \boldsymbol{\Sigma}^2_i \right) / tr \left( \boldsymbol{\Sigma}^2_j \right)\), which equals 1 under \(H_0\).

\[T_{C} = \sum \limits^{k}_{i < j} \frac{ \left( \hat{b} - 1 \right)^2 }{\hat{\delta}^2} \xrightarrow{d} \chi^2\]

\[\hat{b} = \frac{\hat{a}_{2i}}{\hat{a}_{2j}} \quad \quad \hat{\delta}^2 = 4 \left( \frac{2\hat{a}_4}{p \hat{a}_2^2} \sum \limits^k_{i=1} \frac{1}{n_i -1} + \sum \limits^k_{i=1} \frac{1}{ \left( n_i - 1 \right)^2 } \right)\]

Srivastava et al. 2014

  • Srivastava et al. proposed a test in 2014 that had the same form as Schott 2007 but used an unbiased consistent estimator for \(\hat{a}_{2i}\).


\[ T_{S14} = \sum \limits^{k}_{i < j} \frac{ \left( \hat{a}_{2i} + \hat{a}_{2j} - \frac{2}{p}tr \left( \boldsymbol{S}_i \boldsymbol{S}_j \right) \right) ^ 2}{\theta} \xrightarrow{d} \chi^2\]

Srivastava et al. 2014


\[\hat{a}_{2i} = \frac{ \left(n_i -2 \right) \left( n_i -1 \right) tr \left( \boldsymbol{V}_i^2 \right) - n \left( n - k \right) tr \left( \boldsymbol{D}^2_i \right) + tr \left( \boldsymbol{V}_i \right)^2 }{pn_i \left( n_i -1 \right) \left( n_i -2 \right) \left( n_i -3 \right) } \xrightarrow{P} \frac{tr \left( \boldsymbol{\Sigma}_i^2 \right) }{p}\]


\[\boldsymbol{A}_i = \boldsymbol{X}_i^T\boldsymbol{X}_i \quad \quad \boldsymbol{D}_i = diag \left( a_{ij}\right)\]

Ishii et al. 2016

  • Ishii et al. proposed a test in 2016 based on ratios involving the first eigenvalues and first eigenvectors.


\[T_I = \prod \limits^{k}_{i < j} \tilde{\lambda}_* \tilde{h}_* \tilde{\gamma}_* \xrightarrow{d} F\]


  • Components of the test use noise-reduced values, since the largest sample eigenvalue is biased upward in high-dimensional data.

Ishii et al. 2016

  • The first noise-reduced eigenvalue is obtained using the sum of the remaining nonzero eigenvalues of the sample covariance matrix.


\[\tilde{\lambda}_* = \frac{max(\tilde{\lambda}_{i1}, \tilde{\lambda}_{j1})}{min(\tilde{\lambda}_{i1}, \tilde{\lambda}_{j1})}\]


\[\tilde{\lambda}_{i1} = \hat{\lambda}_{i1} - \frac{tr(S_i) - \hat{\lambda}_{i1}}{n_i - 2}\]
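The noise-reduction step can be sketched in numpy (illustrative only; the HDLSS sizes and the inflated first coordinate, standing in for a spiked covariance, are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n_i, p = 15, 100                      # HDLSS setting: p >> n_i

# Spiked population: inflate variance along the first coordinate
X = rng.standard_normal((n_i, p))
X[:, 0] *= 5.0
S = np.cov(X, rowvar=False)

lam_hat = np.linalg.eigvalsh(S)[-1]   # largest sample eigenvalue (biased upward)

# Noise-reduced first eigenvalue from the slides:
# subtract the averaged mass of the remaining eigenvalues
lam_tilde = lam_hat - (np.trace(S) - lam_hat) / (n_i - 2)
```

Since the remaining eigenvalues are nonnegative, `lam_tilde` is always below the raw `lam_hat`, correcting the upward bias.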

Ishii et al. 2016

  • The noise-reduced first eigenvector is obtained using the first noise-reduced eigenvalue.


\[\tilde{h}_* = max(|\tilde{h}^T_i \tilde{h}_j|, |\tilde{h}^T_i \tilde{h}_j|^{-1})\]

\[\tilde{h}_i \]

Ishii et al. 2016

  • The final component of the test is the ratio of the sums of the remaining nonzero eigenvalues of the sample covariance matrices.


\[\tilde{\gamma}_* = max(\frac{\tilde{\kappa}_i}{\tilde{\kappa}_j}, \frac{\tilde{\kappa}_j}{\tilde{\kappa}_i})\]


\[\tilde{\kappa}_i = tr(S_{i}) - \tilde{\lambda}_{i1}\]

Dimension Reduction via the Singular Value Decomposition

Data Scatter Matrix

  • For samples \(\boldsymbol{X}_i\), let \(\boldsymbol{X} = \left[ \boldsymbol{X}_1 \vdots \ \ldots \ \vdots \boldsymbol{X}_i\vdots \ \ldots \ \vdots \boldsymbol{X}_k \right]^T\).

  • Let \(\boldsymbol{M}\) represent the scatter matrix of \(\boldsymbol{X}\)

  • Let \(\boldsymbol{M} = \boldsymbol{UDU}^T\) represent the singular value decomposition of \(\boldsymbol{M}\).

  • Partition \(\boldsymbol{U}\) such that \(\boldsymbol{U} = \left[ \boldsymbol{U}_1 \vdots \, \boldsymbol{U}_2 \right]\) with \(\boldsymbol{U}_1 \in \mathbb{R}^{p \times q}\), where \(q = 1, 2, \dots , p\).

  • Project \(\boldsymbol{X}_i\) to \(q\) dimensions by taking \(\boldsymbol{X}_{Ri}^T = \boldsymbol{U}_1^T \boldsymbol{X}_i^T\).
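The steps above can be sketched in numpy (an illustrative sketch; the group sizes, dimension, and choice of \(q\) are assumptions, and the scatter matrix is taken as the centered sums-of-squares matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
k, n_i, p, q = 2, 15, 100, 5
groups = [rng.standard_normal((n_i, p)) for _ in range(k)]

# Stack all samples and form the centered scatter matrix M = Xc^T Xc
X = np.vstack(groups)
Xc = X - X.mean(axis=0)
M = Xc.T @ Xc                         # p x p, symmetric PSD

# SVD of M (equal to its eigendecomposition here): M = U D U^T
U, D, _ = np.linalg.svd(M)
U1 = U[:, :q]                         # first q left singular vectors

# Project each group to q dimensions: rows of X_i mapped through U1
groups_reduced = [Xi @ U1 for Xi in groups]
```

Each reduced group is \(n_i \times q\), so any of the covariance tests can then be applied in the lower dimension.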

Covariance Difference Concatenation Matrix

  • For groups \(\boldsymbol{X}_i\) let \(\boldsymbol{S_i}\) be the sample covariance matrix.

  • Let \(\widehat{\boldsymbol{M}} := \left[\boldsymbol{S}_2 - \boldsymbol{S}_1 \vdots \ldots \vdots \boldsymbol{S}_i - \boldsymbol{S}_1 \vdots \ldots \vdots \boldsymbol{S}_k - \boldsymbol{S}_1 \right]\) for \(i = 2, \ldots, k\) groups.

  • Let \(\widehat{\boldsymbol{M}} = \boldsymbol{UDV}^T\) represent the singular value decomposition of \(\widehat{\boldsymbol{M}}\).

  • Partition \(\boldsymbol{U}\) such that \(\boldsymbol{U} = \left[ \boldsymbol{U}_1 \vdots \, \boldsymbol{U}_2 \right]\) with \(\boldsymbol{U}_1 \in \mathbb{R}^{p \times q}\).

  • Project \(\boldsymbol{X}_i\) to \(q\) dimensions by taking \(\boldsymbol{X}_{Ri}^T = \boldsymbol{U}_1^T \boldsymbol{X}_i^T\).
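This second reduction can be sketched in numpy as well (illustrative only; the group count, sizes, dimension, and \(q\) are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
k, n_i, p, q = 3, 15, 100, 5
groups = [rng.standard_normal((n_i, p)) for _ in range(k)]
S = [np.cov(Xi, rowvar=False) for Xi in groups]

# Concatenate covariance differences against the first group: p x (k-1)p
M_hat = np.hstack([S_i - S[0] for S_i in S[1:]])

# SVD of M_hat; U1 holds the first q left singular vectors
U, d, Vt = np.linalg.svd(M_hat, full_matrices=False)
U1 = U[:, :q]

# Project each group to q dimensions
groups_reduced = [Xi @ U1 for Xi in groups]
```

Unlike the data scatter matrix, \(\widehat{\boldsymbol{M}}\) is rectangular, so the full SVD \(\boldsymbol{UDV}^T\) is used rather than a symmetric eigendecomposition.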

Simulation Study Setup

Critical Value Simulation

The critical value data sets generated for the simulation study have the following characteristics:

  • \(n_i = 15\)

  • \(p = 100\)

  • We generated the \(k\) populations from \(\mathcal{N}_p \left( \boldsymbol{0}, \boldsymbol{\Sigma}_i \right)\).

  • The number of repetitions is 100,000.

  • \(cv = \inf\{x \in \mathbb{R} : 1 - \alpha \leq \hat{F}_{T}(x) \}\), where \(\alpha = 0.05\).
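The critical value is the empirical \(1 - \alpha\) quantile of the null statistics, which can be sketched as follows (a toy example: \(\chi^2\) draws stand in for the simulated test statistics, and the repetition count is reduced from the study's 100,000):

```python
import numpy as np

rng = np.random.default_rng(5)
alpha = 0.05
reps = 10_000                         # 100,000 in the study; fewer here for speed

# Toy null statistics; in the study these are the test statistics
# computed on data simulated under H0
T_null = rng.chisquare(df=3, size=reps)

# Empirical critical value: smallest x with 1 - alpha <= F_hat(x)
T_sorted = np.sort(T_null)
cv = T_sorted[int(np.ceil((1 - alpha) * reps)) - 1]
```

By construction, at least \(95\%\) of the simulated null statistics fall at or below `cv`.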

Power Simulation

The power simulation data sets generated for the simulation study have the following characteristics:

  • \(n_i = 15\)

  • \(p = 100\)

  • We generated the \(k\) populations from \(\mathcal{N}_p \left( \boldsymbol{0}, \boldsymbol{\Sigma}_i \right)\) and \(\mathcal{N}_p \left( \boldsymbol{0}, c \, \boldsymbol{\Sigma}_i \right)\).

  • The number of repetitions is 1,000.

\[Power = \frac{\sum \limits^{1000}_{i = 1} I(T_i \geq cv)}{1000}\]
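The power estimate is the rejection rate at the simulated critical value, sketched here with toy numbers (the noncentral \(\chi^2\) draws and the critical value are placeholders, not the study's statistics):

```python
import numpy as np

rng = np.random.default_rng(6)
reps = 1000
cv = 7.81                             # placeholder critical value

# Toy alternative statistics; in the study these come from data generated
# under the scaled covariance c * Sigma_i
T_alt = rng.noncentral_chisquare(df=3, nonc=10.0, size=reps)

# Empirical power: fraction of repetitions where T exceeds the critical value
power = (T_alt >= cv).mean()
```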

Results

Two Populations

Toeplitz Covariance No Dimension Reduction


Toeplitz Covariance Dimension Reduction


Toeplitz Covariance at a Frobenius Norm of 0.01


Spiked Covariance No Dimension Reduction

Spiked Covariance Dimension Reduction

Spiked Covariance at a Frobenius Norm of 0.7

Three Populations

Toeplitz Covariance No Dimension Reduction

Toeplitz Covariance Dimension Reduction

Toeplitz Covariance at a Frobenius Norm of 0.1

Spiked Covariance No Dimension Reduction

Spiked Covariance Dimension Reduction

Spiked Covariance at a Frobenius Norm of 0.9

Summary and References

Summary

  • We saw improvements in power with dimension reduction.

  • The high-dimensional tests do not perform well with spiked covariance matrices, and dimension reduction does not appear to help much in that setting.

  • Dimension reduction with the modified likelihood ratio test improved the power and appeared to beat the high dimensional tests when dealing with a spiked covariance matrix.

Future Work

  • Explore the asymptotic properties of the tests in more detail.

  • Investigate why the data scatter dimension reduction method appears to outperform the covariance differences method.

  • Look at another nontrivial number of groups \(k\) \((k \neq 3)\).

  • Look at the Wald test with dimension reduction.

  • See how shrinkage estimators perform with these tests.

  • Look at one-sample high-dimensional tests.

References


Chaipitak, Saowapha, and Samruam Chongcharoen. 2013. “A Test for Testing the Equality of Two Covariance Matrices for High-Dimensional Data.” Journal of Applied Sciences 13 (2): 270–77.

Ishii, Aki, Kazuyoshi Yata, and Makoto Aoshima. 2016. “Asymptotic Properties of the First Principal Component and Equality Tests of Covariance Matrices in High-Dimension, Low-Sample-Size Context.” Journal of Statistical Planning and Inference 170 (March): 186–99.

Ledoit, Olivier, and Michael Wolf. 2004. “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices.” Journal of Multivariate Analysis 88 (2): 365–411.

Schott, James R. 2007. “A Test for the Equality of Covariance Matrices When the Dimension Is Large Relative to the Sample Sizes.” Computational Statistics & Data Analysis 51 (12): 6535–42.

Srivastava, Muni S., Hirokazu Yanagihara, and Tatsuya Kubokawa. 2014. “Tests for Covariance Matrices in High Dimension with Less Sample Size.” Journal of Multivariate Analysis 130 (September): 289–309.